noisy observation
Universal Feature Selection with Noisy Observations and Weak Symmetry Conditions
This paper relaxes the restrictive symmetry conditions adopted in [4], [5] and extends their universal feature selection framework to accommodate noisy observations as well as attribute structures that may exhibit directional preferences. We introduce the notion of weak spherical symmetry, quantified by second-moment distances, which allows controlled deviations from rotational invariance. Under this relaxed condition, we develop a universal feature selection framework based on the singular value decomposition of the canonical dependence matrix computed from noisy data. Our main result shows that the selected features achieve asymptotically optimal error exponents up to a residual term that depends on the symmetry deviation $δ$ and the noise levels $η_1, η_2$. When $δ, η_1, η_2$ are relatively small, our result recovers that of [5], thereby demonstrating that exact spherical symmetry is unnecessary. Overall, our findings highlight the robustness of the selection framework against second-moment deviations and observation noise, thereby broadening its applicability across diverse inference tasks and providing a theoretically grounded tool for universal feature selection in practical scenarios.
Supplementary Material
We say a real-valued random variable $X$ is $\sigma$-sub-Gaussian if its mean is zero and for all $\varepsilon \in \mathbb{R}$ we have $\mathbb{E}[\exp(\varepsilon X)] \le \exp(\sigma^2 \varepsilon^2 / 2)$. Such assumptions on the noise variables are frequently used in bandit optimization. Typically, in kernelized bandits, we assume that the unknown $f \in \mathcal{F}_k(\mathcal{D}; B) = \{ f \in \mathcal{H}_k(\mathcal{D}) : \|f\|_k \le B \}$, where $\mathcal{H}_k(\mathcal{D})$ is the reproducing kernel Hilbert space of functions associated with the given positive-definite kernel function. Typically, the learner knows $\mathcal{F}_k(\mathcal{D}; B)$, meaning that both $k(\cdot, \cdot)$ and $B$ are considered as input to the learner's algorithm. We outline some commonly used kernel functions $k : \mathcal{D} \times \mathcal{D} \to \mathbb{R}$ that we also consider: Linear kernel: $k_{\mathrm{lin}}(x, x') = x^T x'$; Squared exponential kernel: $k_{\mathrm{SE}}(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2 l^2}\right)$; Matérn kernel: $k_{\mathrm{Mat}}(x, x') = 2 \ldots$ Maximum information gain is a kernel-dependent quantity that measures the complexity of the given function class. It was first introduced in [40], and since then it has been used in numerous works on Gaussian process bandits.
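As a minimal sketch of the two kernels whose formulas are fully stated above, the linear and squared exponential kernels can be written in plain Python. The function names, the `lengthscale` argument (standing in for $l$), and the `gram_matrix` helper are assumptions made for illustration and do not come from the paper.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: k_lin(x, x') = x^T x'."""
    return sum(a * b for a, b in zip(x, y))


def squared_exponential_kernel(x, y, lengthscale=1.0):
    """Squared exponential kernel: exp(-||x - x'||^2 / (2 l^2)).

    `lengthscale` plays the role of l; its default is an assumption.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def gram_matrix(kernel, points):
    """Hypothetical helper: kernel matrix K[i][j] = k(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]
```

For example, `gram_matrix(squared_exponential_kernel, [[0.0], [1.0]])` gives a symmetric 2x2 matrix with ones on the diagonal, since each point has zero distance to itself.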
Misspecified Gaussian Process Bandit Optimization
We consider the problem of optimizing a black-box function based on noisy bandit feedback. Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem. They heavily rely on the assumption that the model is well-specified, however, and can fail without it. Instead, we introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS).
Parallel Bayesian Optimization of Multiple Noisy Objectives with Expected Hypervolume Improvement
Optimizing multiple competing black-box objectives is a challenging problem in many fields, including science, engineering, and machine learning. Multi-objective Bayesian optimization (MOBO) is a sample-efficient approach for identifying the optimal trade-offs between the objectives. However, many existing methods perform poorly when the observations are corrupted by noise. We propose a novel acquisition function, NEHVI, that overcomes this important practical limitation by applying a Bayesian treatment to the popular expected hypervolume improvement (EHVI) criterion and integrating over this uncertainty in the Pareto frontier. We argue that, even in the noiseless setting, generating multiple candidates in parallel is an incarnation of EHVI with uncertainty in the Pareto frontier and therefore can be addressed using the same underlying technique. Through this lens, we derive a natural parallel variant, qNEHVI, that reduces computational complexity of parallel EHVI from exponential to polynomial with respect to the batch size.
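The hypervolume improvement that EHVI takes an expectation over can be illustrated with a small deterministic sketch for two objectives under maximization. This is not the NEHVI or qNEHVI acquisition function from the abstract: it ignores noise and the expectation entirely, and the function names and the reference point `ref` are assumptions made for illustration.

```python
def pareto_front(points):
    """Return the non-dominated points (maximizing both objectives)."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded below by the reference point."""
    front = [p for p in pareto_front(points) if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective to the smallest; on a Pareto
    # front the second objective then increases, so each point adds a strip.
    front.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    prev_y = ref[1]
    for x, y in front:
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area


def hypervolume_improvement(new_point, points, ref):
    """Increase in dominated area when `new_point` joins `points`."""
    return hypervolume_2d(points + [new_point], ref) - hypervolume_2d(points, ref)
```

A candidate that is dominated by the current front yields zero improvement, which is why a noisy estimate of the front can mislead a criterion built on this quantity.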
Label Noise Cleaning for Supervised Classification via Bernoulli Random Sampling
Liu, Yuxin, Jin, Xiong, Han, Yang
Label noise - incorrect labels assigned to observations - can substantially degrade the performance of supervised classifiers. This paper proposes a label noise cleaning method based on Bernoulli random sampling. We show that the mean label noise levels of subsets generated by Bernoulli random sampling containing a given observation are identically distributed for all clean observations, and identically distributed, with a different distribution, for all noisy observations. Although the mean label noise levels are not independent across observations, by introducing an independent coupling we further prove that they converge to a mixture of two well-separated distributions corresponding to clean and noisy observations. By establishing a linear model between cross-validated classification errors and label noise levels, we are able to approximate this mixture distribution and thereby separate clean and noisy observations without any prior label information. The proposed method is classifier-agnostic, theoretically justified, and demonstrates strong performance on both simulated and real datasets.
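The sampling step can be sketched as a small simulation. The sketch below assumes the noise flags are known, which is only possible in a simulation; the method in the abstract instead approximates noise levels from cross-validated classification errors, and that step is not shown. The function name, the inclusion probability `p`, and the number of subsets are assumptions made for illustration.

```python
import random


def mean_noise_levels(is_noisy, num_subsets=3000, p=0.5, seed=0):
    """For each observation, average the label noise level of the
    Bernoulli-sampled subsets that contain it.

    is_noisy: list of booleans, True where the label is incorrect
              (known here only because this is a simulation).
    p:        probability that each observation enters a given subset.
    """
    rng = random.Random(seed)
    n = len(is_noisy)
    totals = [0.0] * n
    counts = [0] * n
    for _ in range(num_subsets):
        # Bernoulli random sampling: include each index independently.
        subset = [i for i in range(n) if rng.random() < p]
        if not subset:
            continue
        # Noise level of the subset = fraction of noisy labels in it.
        level = sum(is_noisy[i] for i in subset) / len(subset)
        for i in subset:
            totals[i] += level
            counts[i] += 1
    return [
        totals[i] / counts[i] if counts[i] else float("nan") for i in range(n)
    ]
```

Because a subset that contains a noisy observation always counts that observation in its own noise level, the averages for noisy observations tend to sit above those for clean observations, which is the separation the abstract describes.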
Supplementary Material Misspecified GP Bandit Optimization Ilija Bogunovic and Andreas Krause (NeurIPS 2021) A GP bandits: Useful definitions and auxiliary results (Realizable setting)
Such assumptions on the noise variables are frequently used in bandit optimization. … Gaussian process with posterior mean and variance that correspond to Eq. (8) and Eq. … It also allows us to rewrite Eq. … Gaussian Process (supported on D) with the corresponding kernel function. Suppose the learner's hypothesis class is … While the first two terms in this bound can be effectively controlled and bounded as in the proof of Theorem 1, the last term, i.e., … Such a function can easily be constructed, e.g., via the approach outlined in [36].
Pathwise Learning of Stochastic Dynamical Systems with Partial Observations
The reconstruction and inference of stochastic dynamical systems from data is a fundamental task in inverse problems and statistical learning. While surrogate modeling advances computational methods to approximate these dynamics, standard approaches typically require high-fidelity training data. In many practical settings, the data are observed only indirectly through noisy and nonlinear measurements. The challenge lies not only in approximating the coefficients of the SDEs, but also in simultaneously inferring the posterior updates given the observations. In this work, we present a neural path estimation approach to solve stochastic dynamical systems based on variational inference. We first derive a stochastic control problem that solves for the filtering posterior path measure corresponding to a pathwise Zakai equation. We then construct a generative model that maps the prior path measure to the posterior measure through the controlled diffusion and the associated Radon-Nikodym derivative. Through an amortization of sample paths of the observation process, the control is learned by an embedding of the noisy observation paths. Thus, we learn the unknown prior SDE, the control recovers the conditional path measure given the observation sample paths, and we learn an associated SDE that induces the same path measure. Finally, we perform experiments on nonlinear dynamical systems, demonstrating the model's ability to learn multimodal, chaotic, or high-dimensional systems.